Results 1 - 12 of 12
1.
IEEE Trans Cybern; PP, 2023 Oct 05.
Article in English | MEDLINE | ID: mdl-37796676

ABSTRACT

In recent years, the application of function approximators, such as neural networks and polynomials, has ushered in a new stage of development in solving optimal control problems. However, in the presence of approximation errors, the stability of the controlled system cannot be guaranteed. Motivated by this, we investigate optimal tracking control problems for discrete-time systems with approximation errors. First, a novel value function is introduced into the intelligent critic framework. Second, an implicit method is utilized to demonstrate the boundedness of the iterative value functions with approximation errors, and an explicit method is applied to prove the stability of the system with approximation errors. Furthermore, an evolving policy is designed to iteratively tackle the optimal tracking control problem and demonstrate the stability of the system. Finally, the effectiveness of the developed method is verified through both numerical and practical examples.

2.
IEEE Trans Cybern; PP, 2023 Jan 18.
Article in English | MEDLINE | ID: mdl-37021869

ABSTRACT

Inspired by the successive relaxation method, a novel discounted iterative adaptive dynamic programming framework is developed, in which the iterative value function sequence possesses an adjustable convergence rate. The convergence properties of the value function sequence and the stability of the closed-loop system under the new discounted value iteration (VI) are investigated. Based on the properties of the given VI scheme, an accelerated learning algorithm with a convergence guarantee is presented. Moreover, the implementations of the new VI scheme and its accelerated learning design are elaborated, involving value function approximation and policy improvement. A nonlinear fourth-order ball-and-beam balancing plant is used to verify the performance of the developed approaches. Compared with traditional VI, the present discounted iterative adaptive critic designs greatly accelerate the convergence of the value function while reducing the computational cost.
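
The successive-relaxation idea can be illustrated on a toy tabular problem. The sketch below (hypothetical transition and cost numbers, not the paper's ball-and-beam plant) replaces the standard Bellman backup V_{k+1} = T[V_k] with V_{k+1} = (1 - ω)V_k + ω T[V_k], where the relaxation factor ω adjusts the convergence rate without changing the fixed point:

```python
import numpy as np

def relaxed_value_iteration(P, C, gamma=0.95, omega=1.0, iters=1000):
    """Discounted VI with a relaxation factor omega.

    V_{k+1} = (1 - omega) * V_k + omega * T[V_k],
    where T is the Bellman optimality operator; omega = 1 recovers
    standard value iteration.
    P: (A, S, S) transition matrices, C: (A, S) stage costs.
    """
    A, S, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = C + gamma * P @ V          # (A, S) action-value table
        TV = Q.min(axis=0)             # Bellman backup
        V = (1.0 - omega) * V + omega * TV
    policy = Q.argmin(axis=0)
    return V, policy

# Toy 2-state, 2-action problem (hypothetical numbers).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
C = np.array([[1.0, 2.0],
              [1.5, 0.5]])
V_std, _ = relaxed_value_iteration(P, C, omega=1.0)
V_rel, _ = relaxed_value_iteration(P, C, omega=0.5)
```

For ω in (0, 1] the relaxed operator is still a contraction, with modulus 1 - ω + ωγ, so both runs reach the same optimal value function; the framework above additionally studies how the adjustable rate interacts with closed-loop stability.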

3.
Article in English | MEDLINE | ID: mdl-37027589

ABSTRACT

In this article, the generalized N-step value gradient learning (GNSVGL) algorithm, which takes a long-term prediction parameter λ into account, is developed for infinite-horizon discounted near-optimal control of discrete-time nonlinear systems. The proposed GNSVGL algorithm can accelerate the learning process of adaptive dynamic programming (ADP) and achieves better performance by learning from more than one future reward. Compared with the traditional N-step value gradient learning (NSVGL) algorithm with zero initial functions, the proposed GNSVGL algorithm is initialized with positive definite functions. Considering different initial cost functions, the convergence analysis of the value-iteration-based algorithm is provided. The stability condition for the iterative control policy is established to determine the value of the iteration index under which the control law makes the system asymptotically stable. Under such a condition, if the system is asymptotically stable at the current iteration, then the iterative control laws after this step are guaranteed to be stabilizing. Two critic neural networks and one action network are constructed to approximate the one-return costate function, the λ-return costate function, and the control law, respectively. It is emphasized that the one-return and λ-return critic networks are combined to train the action neural network. Finally, simulation studies and comparisons confirm the superiority of the developed algorithm.
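The λ-return at the heart of N-step value gradient methods is a geometric mixture of n-step targets. A minimal sketch with scalar returns and hypothetical numbers (the GNSVGL algorithm itself applies the same weighting to costate, i.e. value-gradient, targets):

```python
def n_step_return(rewards, values, t, n, gamma):
    """n-step return from time t: n discounted rewards plus a
    bootstrapped value estimate at t + n (no bootstrap past the end)."""
    T = len(rewards)
    G, g = 0.0, 1.0
    for k in range(min(n, T - t)):
        G += g * rewards[t + k]
        g *= gamma
    if t + n < T:
        G += g * values[t + n]
    return G

def lambda_return(rewards, values, t, gamma, lam, n_max=50):
    """Geometric (1 - lam) * lam^(n - 1) mixture of n-step returns,
    with the residual tail weight placed on the n_max-step return."""
    G, w = 0.0, 1.0 - lam
    for n in range(1, n_max):
        G += w * n_step_return(rewards, values, t, n, gamma)
        w *= lam
    G += lam ** (n_max - 1) * n_step_return(rewards, values, t, n_max, gamma)
    return G

rewards = [1.0, 0.0, 2.0, 1.0]
values  = [0.5, 0.5, 0.5, 0.5]
g1 = lambda_return(rewards, values, 0, 0.9, 0.0)
```

With λ = 0 this collapses to the one-step bootstrapped target r_t + γV(s_{t+1}); with λ = 1 it becomes the full (truncated) return, which is why λ interpolates between short- and long-term prediction.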

4.
IEEE Trans Neural Netw Learn Syst; 34(10): 7430-7442, 2023 Oct.
Article in English | MEDLINE | ID: mdl-35089866

ABSTRACT

In this article, a novel value iteration (VI) scheme is developed with convergence and stability discussions. A relaxation factor is introduced to adjust the convergence rate of the value function sequence, and the convergence conditions with respect to the relaxation factor are given. The stability of the closed-loop system under the control policies generated by the present VI algorithm is investigated. Moreover, an integrated VI approach is developed to accelerate and guarantee convergence by combining the advantages of the present and traditional value iterations. In addition, a relaxation function is designed so that the developed VI scheme adaptively possesses a fast convergence property. Finally, the theoretical results and the effectiveness of the present algorithm are validated by numerical examples.

5.
IEEE Trans Neural Netw Learn Syst; 34(11): 8707-8718, 2023 Nov.
Article in English | MEDLINE | ID: mdl-35239493

ABSTRACT

In this article, the general value iteration (GVI) algorithm for discrete-time zero-sum games is investigated. The theoretical analysis focuses on the stability properties of the system and the admissibility properties of the iterative policy pair. A new criterion is established to determine the admissibility of the current policy pair. Based on this admissibility criterion, an improved GVI algorithm for zero-sum games is developed to guarantee that all iterative policy pairs are admissible once the current policy pair satisfies the criterion. On the basis of the attraction domain, we demonstrate that the state trajectory stays in this region under the fixed or the evolving policy pair whenever the initial state belongs to the domain. It is emphasized that the evolving policy pair can stabilize the controlled system. These theoretical results are applied to linear and nonlinear systems via offline and online critic control designs.
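
The value iteration underlying such zero-sum designs can be sketched in tabular form. Below is a minimal min-max VI on a hypothetical two-state game; the article works with general dynamics and critic networks, and the min-over-u of max-over-w backup shown here corresponds to the upper value of the game (the controller commits first):

```python
import numpy as np

def zero_sum_vi(P, C, gamma=0.95, iters=1000):
    """Tabular value iteration for a discrete-time zero-sum game.

    The control input u minimizes the cost while the disturbance w
    maximizes it; with finite action sets the backup is a min over u
    of a max over w.
    P: (U, W, S, S) transition matrices, C: (U, W, S) stage costs.
    """
    S = P.shape[2]
    V = np.zeros(S)
    for _ in range(iters):
        Q = C + gamma * P @ V          # (U, W, S)
        V = Q.max(axis=1).min(axis=0)  # max over w, then min over u
    u_policy = Q.max(axis=1).argmin(axis=0)
    return V, u_policy

# Toy 2-state game, two actions per player (hypothetical numbers).
P = np.array([[[[0.8, 0.2], [0.3, 0.7]],
               [[0.6, 0.4], [0.5, 0.5]]],
              [[[0.9, 0.1], [0.1, 0.9]],
               [[0.4, 0.6], [0.7, 0.3]]]])
C = np.array([[[1.0, 0.0], [0.5, 1.5]],
              [[0.2, 1.0], [1.0, 0.3]]])
V, u_policy = zero_sum_vi(P, C)
```

The min-max backup is still a γ-contraction, so the iteration converges to a unique fixed point regardless of the initial value function.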

6.
IEEE Trans Neural Netw Learn Syst; 34(9): 6504-6514, 2023 Sep.
Article in English | MEDLINE | ID: mdl-34986105

ABSTRACT

For discounted optimal regulation design, the stability of the controlled system is affected by the discount factor: if an inappropriate discount factor is employed, the optimal control policy might fail to be stabilizing. Therefore, in this article, the effect of the discount factor on the stabilization of control strategies is discussed. We develop a system stability criterion and selection rules for the discount factor with respect to the linear quadratic regulator problem under the general discounted value iteration algorithm. Based on the monotonicity of the value function sequence, a method to judge the stability of the controlled system during the iteration process is established. In addition, once some stability conditions are satisfied at a certain iteration step, all control policies after this step are stabilizing. Furthermore, in combination with the undiscounted optimal control problem, a practical rule for selecting an appropriate discount factor is constructed. Finally, several simulation examples with physical backgrounds are conducted to demonstrate the present theoretical results.
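
The effect of the discount factor on closed-loop stability is easy to reproduce for a small linear quadratic example. The sketch below (hypothetical system matrices, not taken from the article) solves the discounted LQR problem by value iteration on the Riccati recursion and checks the spectral radius of the closed loop:

```python
import numpy as np

def discounted_lqr_gain(A, B, Q, R, gamma, iters=3000):
    """Value iteration (Riccati recursion) for the discounted LQR cost
    sum_k gamma^k (x'Qx + u'Ru); returns the feedback gain K."""
    P = np.zeros_like(Q)
    for _ in range(iters):
        K = gamma * np.linalg.solve(R + gamma * B.T @ P @ B, B.T @ P @ A)
        P = Q + gamma * A.T @ P @ (A - B @ K)
    return K

def spectral_radius(M):
    return max(abs(np.linalg.eigvals(M)))

# Hypothetical unstable plant: x_{k+1} = A x_k + B u_k.
A = np.array([[1.2, 0.1],
              [0.0, 1.1]])
B = np.array([[0.0],
              [1.0]])
Qc, Rc = np.eye(2), np.eye(1)

rho_small = spectral_radius(A - B @ discounted_lqr_gain(A, B, Qc, Rc, 0.3))
rho_one   = spectral_radius(A - B @ discounted_lqr_gain(A, B, Qc, Rc, 1.0))
```

Here a heavily discounted cost (γ = 0.3) leaves the unstable modes essentially unpenalized in the long run, so the optimal gain need not be stabilizing, while γ = 1 recovers the standard stabilizing LQR gain; this is exactly the kind of behavior the selection rules above are meant to rule out.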

7.
IEEE Trans Neural Netw Learn Syst; 34(8): 4237-4248, 2023 Aug.
Article in English | MEDLINE | ID: mdl-34752409

ABSTRACT

In this article, a novel neuro-optimal tracking control approach is developed for discrete-time nonlinear systems. By constructing a new augmented plant, the optimal trajectory tracking design is transformed into an optimal regulation problem. For discrete-time nonlinear dynamics, the steady control input corresponding to the reference trajectory is given. Then, the value-iteration-based tracking control algorithm is provided and the convergence of the value function sequence is established. Therein, the approximation error between the iterative value function and the optimal cost is estimated. The uniform ultimate boundedness of the closed-loop system is also discussed in detail. Moreover, the iterative heuristic dynamic programming (HDP) algorithm is implemented with critic and action components, where new updating rules for the action network are provided. Finally, two examples demonstrate the optimality of the present controller as well as the effectiveness of the proposed method.
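
The augmented-plant transformation can be made concrete with a small example. In the sketch below (hypothetical scalar dynamics, not the article's plant), the steady control u_d(r) is chosen so that the plant follows the reference exactly, and the tracking problem becomes regulation of the error e in the augmented (e, r) plant:

```python
import numpy as np

# Hypothetical plant x_{k+1} = f(x_k, u_k) and reference r_{k+1} = phi(r_k).
def f(x, u):
    return 0.9 * x + 0.1 * np.sin(x) + u

def phi(r):
    return 0.99 * r

def steady_control(r):
    """Steady control u_d(r) solving f(r, u_d) = phi(r), i.e. the input
    that keeps the plant exactly on the reference trajectory."""
    return phi(r) - 0.9 * r - 0.1 * np.sin(r)

def augmented_step(e, r, v):
    """One step of the augmented plant in (tracking error, reference)
    coordinates, with shifted control v = u - u_d(r). Regulating e to 0
    in this plant is the original tracking problem."""
    x = e + r                      # recover the plant state
    u = steady_control(r) + v
    r_next = phi(r)
    e_next = f(x, u) - r_next
    return e_next, r_next

# With v = 0 and zero initial error, the error stays at zero.
e, r = 0.0, 1.0
for _ in range(20):
    e, r = augmented_step(e, r, 0.0)
```

An optimal regulator designed for the augmented plant then yields the tracking controller u = u_d(r) + v(e, r).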

8.
IEEE Trans Cybern; 53(7): 4487-4499, 2023 Jul.
Article in English | MEDLINE | ID: mdl-36063514

ABSTRACT

In this article, evolving and incremental value iteration (VI) frameworks are constructed to address the discrete-time zero-sum game problem. First, in the evolving scheme, the closed-loop system is regulated using the evolving policy pair; during the control stage, a stability criterion is established to guarantee the availability of evolving policy pairs. Second, a novel incremental VI algorithm, which takes the historical information of the iterative process into account, is developed to solve the regulation and tracking problems for the nonlinear zero-sum game. By introducing different incremental factors, the convergence rate of the iterative cost function sequence can be adjusted. Finally, two simulation examples, involving linear and nonlinear systems, are conducted to demonstrate the performance and validity of the proposed evolving and incremental VI schemes.


Subjects
Neural Networks (Computer), Nonlinear Dynamics, Algorithms, Computer Simulation
9.
Article in English | MEDLINE | ID: mdl-37015365

ABSTRACT

In this article, to solve the optimal tracking control problem (OTCP) for discrete-time (DT) nonlinear systems, a general value iteration (GVI) scheme and online value iteration (VI) algorithms with a novel value function are discussed. First, the disadvantage of the traditional value function for the OTCP is presented and the novel value function is introduced. Second, we analyze the monotonicity and convergence of GVI and establish an admissibility condition to evaluate the admissibility of the current iterative control; a novel approach is introduced for this admissibility analysis. Third, based on the attraction domain, improved control policies with online VI can be obtained by judging the locations of the current tracking error and the reference point. Finally, the stability of the online VI-based control system is guaranteed. Two simulation examples show the performance of the proposed methods.

10.
IEEE Trans Cybern; 52(12): 13262-13274, 2022 Dec.
Article in English | MEDLINE | ID: mdl-34516384

ABSTRACT

This article is concerned with the stability of the closed-loop system under various control policies generated by value iteration (VI). Stability properties involving admissibility criteria, the attraction domain, and so forth are investigated. An offline integrated VI scheme with a stability guarantee is developed by combining the advantages of VI and policy iteration, which makes it convenient to obtain admissible control policies. In addition, based on the concept of the attraction domain, an online adaptive dynamic programming (ADP) algorithm using immature control policies is developed. Remarkably, it is ensured that the state trajectory under the online algorithm converges to the origin. In particular, for linear systems, the online ADP algorithm with a general scheme possesses an enhanced stability property: the theoretical results reveal that the stability of the linear system can be guaranteed even if the control policy sequence includes finitely many unstable elements. The numerical results verify the effectiveness of the present algorithms.
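
The integrated scheme, a value iteration warm start followed by a policy iteration phase, can be sketched on a tabular problem. All numbers below are hypothetical, and the article's offline algorithm additionally verifies admissibility before switching; this sketch only shows the two-phase structure:

```python
import numpy as np

def bellman_q(P, C, V, gamma):
    return C + gamma * P @ V           # (A, S) action-value table

def policy_eval(P, C, pi, gamma):
    """Exact evaluation of a fixed policy via a linear solve."""
    S = P.shape[1]
    Ppi = P[pi, np.arange(S), :]       # row s: transitions under pi[s]
    Cpi = C[pi, np.arange(S)]
    return np.linalg.solve(np.eye(S) - gamma * Ppi, Cpi)

def integrated_vi(P, C, gamma=0.95, vi_steps=20, pi_steps=10):
    """Warm-start with cheap VI backups, then switch to policy
    iteration for fast, exact convergence."""
    V = np.zeros(P.shape[1])
    for _ in range(vi_steps):          # VI phase
        V = bellman_q(P, C, V, gamma).min(axis=0)
    for _ in range(pi_steps):          # PI phase: greedy step + exact evaluation
        pi = bellman_q(P, C, V, gamma).argmin(axis=0)
        V = policy_eval(P, C, pi, gamma)
    return V, pi

# Toy 2-state, 2-action problem (hypothetical numbers).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
C = np.array([[1.0, 2.0],
              [1.5, 0.5]])
V_opt, pi_opt = integrated_vi(P, C)
```

The VI phase is cheap per step and brings the value function close to optimal; the PI phase is exact but needs a linear solve per iteration, and from a good warm start it converges in a handful of policy updates.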

11.
Neural Netw; 144: 176-186, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34500256

ABSTRACT

A data-based value iteration algorithm with a bidirectional approximation feature is developed for discounted optimal control. The unknown nonlinear system dynamics is first identified by establishing a model neural network. To improve the identification precision, biases are introduced into the model network. The model network with biases is trained by the gradient descent algorithm, where the weights and biases across all layers are updated. The uniform ultimate boundedness stability with a proper learning rate is analyzed using the Lyapunov approach. Moreover, an integrated value iteration with the discounted cost is developed to fully guarantee the approximation accuracy of the optimal value function. Finally, the effectiveness of the proposed algorithm is demonstrated by two simulation examples with physical backgrounds.
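
A minimal version of the identification step can be sketched with a one-hidden-layer model network. Everything below (plant, layer size, learning rate) is hypothetical; the point is that the biases of both layers are trained alongside the weights by gradient descent:

```python
import numpy as np

rng = np.random.default_rng(0)

# Unknown plant to be identified (hypothetical scalar dynamics).
def plant(x, u):
    return 0.8 * x + 0.2 * np.tanh(x) + 0.5 * u

# One-hidden-layer model network with biases in BOTH layers.
H = 16
W1 = 0.5 * rng.standard_normal((H, 2)); b1 = np.zeros(H)
W2 = 0.5 * rng.standard_normal((1, H)); b2 = np.zeros(1)

def model(z):
    h = np.tanh(W1 @ z + b1)
    return (W2 @ h + b2)[0], h

# Stochastic gradient descent on 0.5 * e^2, updating the weights
# and the biases of every layer.
lr = 0.05
for _ in range(20000):
    x, u = rng.uniform(-1.0, 1.0, size=2)
    z = np.array([x, u])
    y, h = model(z)
    e = y - plant(x, u)                    # identification error
    dh = e * W2[0] * (1.0 - h ** 2)        # backprop through tanh
    W2 -= lr * e * h[None, :]; b2 -= lr * e
    W1 -= lr * np.outer(dh, z); b1 -= lr * dh
```

Freezing the biases at zero would restrict the class of functions the network can represent (its output would be an odd function of the input), which is consistent with the motivation given above for making them trainable.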


Subjects
Neural Networks (Computer), Nonlinear Dynamics, Algorithms, Computer Simulation, Learning
12.
Neural Netw; 143: 121-132, 2021 Nov.
Article in English | MEDLINE | ID: mdl-34118779

ABSTRACT

In this paper, we aim to solve the optimal tracking control problem for a class of nonaffine discrete-time systems with actuator saturation. First, a data-based neural identifier is constructed to learn the unknown system dynamics. Then, according to the expression of the trained neural identifier, we can obtain the steady control corresponding to the reference trajectory. Next, by employing the iterative dual heuristic dynamic programming algorithm, the new costate function and the tracking control law are developed. Two other neural networks are used to estimate the costate function and approximate the tracking control law. Considering the approximation errors of the neural networks, the stability analysis of the proposed algorithm for the specific systems is provided via the Lyapunov approach. Finally, simulations and comparisons confirm the superiority of the developed optimal tracking method. Moreover, the trajectory tracking performance in a wastewater treatment application is examined to further verify the proposed approach.


Subjects
Nonlinear Dynamics, Water Purification, Algorithms, Feedback, Neural Networks (Computer)